OW QMIX, CW QMIX, QTRAN, QMIX, and VDN are state-of-the-art algorithms for solving Dec-POMDPs. However, they fail to solve domains that require complex cooperation between agents, such as box-pushing. We present a two-stage algorithm for such problems. In the first stage, we solve a single-agent problem (POMDP) and obtain optimal policy traces. In the second stage, we solve the multi-agent problem (Dec-POMDP) using the single-agent optimal policy traces. This single-agent-to-multi-agent approach shows a clear advantage over OW QMIX, CW QMIX, QTRAN, QMIX, and VDN on complex cooperative domains.